Probing the effect of OSCE checklist length on inter-observer reliability and observer accuracy
Abstract
PURPOSE: The Objective Structured Clinical Examination (OSCE) is a widely employed tool for measuring clinical competence. In the drive toward comprehensive assessment, OSCE stations and checklists may become increasingly complex. The objective of this study was to probe inter-observer reliability and observer accuracy as a function of OSCE checklist length.
METHOD: Study participants included emergency physicians and senior residents in Emergency Medicine at Dalhousie University. Participants watched an identical series of four scripted, standardized videos enacting 10-min OSCE stations and completed the corresponding assessment checklists. Each participating observer was provided with a random combination of two 40-item and two 20-item checklists. A panel of physicians scored the scenarios through repeated video review to determine the 'gold standard' checklist scores.
RESULTS: Fifty-seven observers completed 228 assessment checklists. Mean observer accuracy ranged from 73 to 93% (14.6-18.7/20), with an overall accuracy of 86% (17.2/20), and inter-rater reliability ranged from 58 to 78%. After controlling for station and individual variation, the number of checklist items had no observed effect on overall accuracy (p=0.2305). Consistency in ratings was calculated using the intraclass correlation coefficient and showed no significant difference between the 20- and 40-item checklists (coefficients ranged from 0.432 to 0.781; p-values from 0.56 to 0.73).
CONCLUSIONS: The addition of 20 checklist items to a core list of 20 items in an OSCE assessment checklist does not appear to impact observer accuracy or inter-rater reliability.
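The accuracy figures above pair a raw score out of 20 with a percentage (for example, 17.2/20 = 86%). The abstract does not state how each checklist was scored against the gold standard, so the following is only a minimal sketch under one plausible assumption: accuracy is the fraction of checklist items on which an observer's mark agrees with the panel's mark. The function name `observer_accuracy` and the sample data are hypothetical and do not come from the study.

```python
from typing import Sequence


def observer_accuracy(observed: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of checklist items where the observer agrees with the gold standard.

    Assumption (not stated in the abstract): each item is a done / not-done
    mark and accuracy is simple item-level agreement.
    """
    if len(observed) != len(gold):
        raise ValueError("checklists must have the same number of items")
    if not gold:
        raise ValueError("checklist must not be empty")
    matches = sum(o == g for o, g in zip(observed, gold))
    return matches / len(gold)


# Hypothetical 20-item checklist: the panel marks the first 12 items as done.
gold = [True] * 12 + [False] * 8

# Hypothetical observer who disagrees with the panel on three items.
observed = list(gold)
for i in (0, 5, 15):
    observed[i] = not observed[i]

accuracy = observer_accuracy(observed, gold)  # 17 of 20 items agree

# Consistency checks on figures reported in the abstract.
checklists_completed = 57 * 4      # 57 observers, four stations each
overall_accuracy = 17.2 / 20       # reported as 86%
```

With these made-up marks the observer agrees on 17 of 20 items. The last two lines only confirm that the reported totals are internally consistent: 57 observers each scoring four videos gives the 228 checklists mentioned, and 17.2/20 corresponds to the stated 86%.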
Similar resources
Comparison of Singh index accuracy and dual energy X-ray absorptiometry bone mineral density measurement for evaluating osteoporosis
Background: The Singh index is an inexpensive, simple method to evaluate bone density. Commonly used to assess osteoporosis, it is based on the radiological appearance of the trabecular bone structure of the proximal femur on a plain antero-posterior (AP) radiograph. The purpose of this study was to compare the Singh index with bone mineral density measurement using dual energy X-ray abs...
Evaluation of the accuracy of reverse contrast software in digital radiography for detecting vertical root fracture (in vitro)
Background and Aims: Diagnosis of vertical root fractures (VRF) often poses a clinical dilemma. VRF is difficult to diagnose on intraoral radiographs except when the beam is perpendicular to the direction of the fracture. Misdiagnosis often leads to wrong decisions about the future treatment plan for the tooth. The aim of this study was to determine the diagnostic accuracy of reverse cont...
Comparison of Double and Single Leg Weight-Bearing Radiography in Determining Knee Alignment
Background: Knee malalignment is an important modifiable cause of osteoarthritis (OA). Surgical therapeutic procedures depend on proper knee alignment assessment. The purpose of this study was to compare knee alignment parameters between double and single leg weight-bearing radiographs and to evaluate the reproducibility of inter- and intra-observer measurements. Methods: One hundred eight p...
Comparative evaluation of the accuracy of cone beam CT, intraoral radiography, and periodontal probing in measuring periodontal bone defects
Background and Aims: Cone beam computed tomography (CBCT) produces high-quality data for diagnosis and periodontal treatment. To date, there has not been enough research on periodontal bone measurement using CBCT. The aim of this study was to compare the accuracy of CBCT in measuring periodontal defects to that of intraoral radiography and probing methods. Materials and Methods: Two-hundred a...
How to Assess Inter-Observer Reliability of Ratings Made on Ordinal Scales: Evaluating and Comparing the Emergency Severity Index (Version 3) and Canadian Triage Acuity Scale
An exact, optimal (“maximum-accuracy”) psychometric methodology for assessing inter-observer reliability for measures involving ordinal ratings is used to evaluate and compare two emergency medicine triage algorithms—both of which classify patients into one of five ordinal categories. Ten raters independently evaluated the identical set of 200 patients, five with each algorithm. Analysis reveal...